Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add ping pong functionality #45

Closed
wants to merge 4 commits into from
Closed

Conversation

c3mb0
Copy link
Contributor

@c3mb0 c3mb0 commented Oct 8, 2016

I have been using PING frames in production for a few months now and it greatly helps with the health of persistent APNs connections. Production clients have not redialed production APNs servers in 1.5 months, and that's only because we deployed around that time. I know this because I've been logging DialTLS. Development clients are another story, since development server connections reset randomly every 1-3 weeks due to Apple's machines moving around and changing IPs.

Keeping a healthy connection also seems decrease the overall amount of network related push errors.

This PR contains a generalized version of the custom solution I've been using. It lacks tests because I am not quite sure how to emulate PONG frames during testing. Any help on that front is appreciated.

It might also make sense to move/use some of this code in ClientManager; That is up for discussion.

@coveralls
Copy link

coveralls commented Oct 8, 2016

Coverage Status

Coverage decreased (-12.2%) to 77.483% when pulling 1f4abb9 on c3mb0:ping-pong into e42f05d on sideshow:master.

@coveralls
Copy link

coveralls commented Oct 9, 2016

Coverage Status

Coverage increased (+0.2%) to 89.845% when pulling 0785fde on c3mb0:ping-pong into e42f05d on sideshow:master.

@coveralls
Copy link

coveralls commented Oct 9, 2016

Coverage Status

Coverage increased (+0.2%) to 89.845% when pulling a877f6a on c3mb0:ping-pong into e42f05d on sideshow:master.

@c3mb0
Copy link
Contributor Author

c3mb0 commented Oct 9, 2016

Oh silly me, we don't actually need to emulate PONG frames (since they are not read), being able to successfully send PING frames should suffice.

defer client.m.Unlock()
client.conn = c
if client.pinging {
client.newConnChan <- struct{}{}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's blocks forever if we never call EnablePinging. Is it ok here? (Maybe I miss some context, sorry.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't since when pinging is false, the function won't try to send a signal to newConnChan, and releases the lock when it returns. When pinging is true, there is a goroutine that reads from newConnChan, so writing to it should never block. Am I missing something?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. Sorry again :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S'all good, the more eyes the better!

@sideshow
Copy link
Owner

Hey @c3mb0 - Cheers for the pull request. A few questions;

  1. I noted ping Frames have just been added to the golang http2 transport golang/net@ed0556c (background x/net/http2: no public method to send PING frames golang/go#15475) . Would it make sense to use these now, as it would clean up that part of the code by not having to construct the frame?
  2. Is there a better solution to get a handle on the connection rather than just holding on to the last connection you create through DialTLS. Ie, ConnPool field on the http2 transport - ClientConnPool interface has a GetClientConn method

@c3mb0
Copy link
Contributor Author

c3mb0 commented Oct 24, 2016

Great catch @sideshow, I did not realize this was added to the http2 package.

Correct me if I'm wrong, but it seems like GetClientConn is designed to be a server side function, since it takes *http.Request as one of the parameters. However, there is another function called NewClientConn available to *http2.Transport that takes in a net.Conn and returns a ClientConn. It seems suitable for our use case.

Using this approach could clean up the code a bit, but not my much: After taking a quick peek, from what I can tell, we would still need to create a new ClientConn every time DialTLS happens, and hold onto it just like we do with net.Conn. However, the pinging portion would be "official", which seems like a better practice. I will do some extended tests and update the PR accordingly.

Kind of off-topic, but an alternative approach to pinging would be to handle all the pinging transparently within NewClient, however this approach removes the opt-in nature of the current implementation and might be considered an API breaking change since people updating the library could end up with ticker leaks. It also adds an additional step (closing the pinger ticker) for people who would like to use this library for a push every now and then. It's just one line, but it's something to consider. On the other hand, the code becomes much simpler and cleaner:

func NewClient(certificate tls.Certificate) (client *Client) {
    tlsConfig := &tls.Config{
        Certificates: []tls.Certificate{certificate},
    }
    if len(certificate.Certificate) > 0 {
        tlsConfig.BuildNameToCertificate()
    }
    quitChan := make(chan struct{})
    ccChan := make(chan *http2.ClientConn)
    go func() {
        pinger := new(time.Ticker)
        ctx, cancelFunc := context.WithCancel(context.Background())
        var cc *http2.ClientConn
        for {
            select {
            case <-pinger.C:
                err := cc.Ping(ctx)
                if err != nil {
                    pinger.Stop()
                }
            case cc = <-ccChan:
                pinger.Stop()
                pinger = time.NewTicker(PingPongFrequency)
            case <-quitChan:
                defer cancelFunc()
                pinger.Stop()
                return
            }
        }
    }()
    transport := &http2.Transport{TLSClientConfig: tlsConfig}
    customDial := func(network, addr string, cfg *tls.Config) (conn net.Conn, err error) {
        conn, err = tls.DialWithDialer(&net.Dialer{Timeout: TLSDialTimeout}, network, addr, cfg)
        if err == nil {
            cc, err := transport.NewClientConn(conn)
            if err == nil {
                ccChan <- cc
            }
        }
        return
    }
    transport.DialTLS = customDial
    return &Client{
        HTTPClient: &http.Client{
            Transport: transport,
            Timeout:   HTTPClientTimeout,
        },
        Certificate: certificate,
        Host:        DefaultHost,
        quitChan:    quitChan,
    }
}

I haven't tested the code above, it' just to show how shorter and sweeter the non opt-in approach is. The breaking change could be mitigated via gopkg.in. What do you think?

}
go func() {
// 8 bytes of random data used for PING-PONG, as per HTTP/2 spec.
data := [8]byte{byte(rand.Intn(256)), byte(rand.Intn(256)), byte(rand.Intn(256)), byte(rand.Intn(256)), byte(rand.Intn(256)), byte(rand.Intn(256)), byte(rand.Intn(256)), byte(rand.Intn(256))}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would perhaps look a bit cleaner:

var data [8]byte
rand.Reader.Read(data[:])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I will change and update the PR.

}
case <-c.newConnChan:
c.m.Lock()
framer = http2.NewFramer(c.conn, c.conn)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't the transport technically dial multiple connections? I guess in practice it doesn't, but is there anything in place stopping it from doing so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source code and many production tests reveal that it does not.

@coveralls
Copy link

coveralls commented Dec 5, 2016

Coverage Status

Coverage increased (+0.3%) to 89.956% when pulling 6094d0e on c3mb0:ping-pong into e42f05d on sideshow:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.3%) to 89.956% when pulling f0f0e42 on c3mb0:ping-pong into 734432d on sideshow:master.

1 similar comment
@coveralls
Copy link

coveralls commented Dec 5, 2016

Coverage Status

Coverage increased (+0.3%) to 89.956% when pulling f0f0e42 on c3mb0:ping-pong into 734432d on sideshow:master.

@sideshow
Copy link
Owner

sideshow commented Jan 3, 2017

@c3mb0 Apologies for the long delay on reviewing this.

  • The problem is that the go HttpClient doesn't just use one connection, but rather has an underlying connection pool. The ConnPool manages a pool of HTTP/2 client connections - You are only holding on to the last connection that was made to the server, and sending ping frames to that. So if you have 10 say connections because you have been smashing APNs and are only pinging the last connection its not going to fully solve the problem. This may be working in your scenario because you aren't sending enough to cause the client connection pool to open more than one connection.

  • For the same reason above this code would make more sense at the transport level, or in a custom client connection pool level rather than in the Client level. As you mentioned it would be ale to also help with issue add ping pong functionality #45.

@sideshow sideshow mentioned this pull request Jan 3, 2017
@nvartolomei
Copy link

@sideshow are you familiar enough with go http2 internals to say in which case it will open multiple http2 connections to oner server? In most of the cases you will have a single connection, except the case when you are doing more than maxConcurrentStreams pushes, but in this case you don't need ping.

If I got it right, the ping is only needed when you have times without pushing, but you want to have low latency when you start sending notifications again, this will ensure that at least one connection is alive and more will be started if needed.

What do you think?

@sideshow
Copy link
Owner

sideshow commented Aug 23, 2017

@nvartolomei Yes it will open new connections when you are doing more than maxConcurrentStreams. Existing connections remain in the pool unless the connection is closed, or they are closed as idle internally or the client is sent a goaway frame.

In theory you are correct about only needing ping frames for low latency when pushing. If the connection becomes stale the server should send goaway frames, and the client should know to close the stream. However, judging by these two pull requests and the issue that was created, the http2 transport was failing to do this. I would say this was potentially just a misunderstanding between client and server, most likely because the http2 specification is still new.

Im not sure if the modifications to the go transport layer since this have solved this issue. In any case, the correct place to send ping frames would be the connection pool and not the http client. This is the only place where you have a handle on all the currently active connections and not just the last one.

I'm closing this as stale, but have logged an issue/enhancement ticket around creating a better connection pool which could have this functionality #88

@sideshow sideshow closed this Aug 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants